How to create a Common Data Model using Mapping Data Flows and connect it to Customer Insights

January 13, 2023

Building a Common Data Model is simple, and the result can be stored in an Azure Data Lake Storage Gen2 account. This is how I built my own CDM in a lake, and then connected Dynamics 365 Customer Insights to it.

A Common Data Model is an industry-standard, semantically rich way of conforming data to a single standard so that many different types of applications can use it, including plugging it into AI models without much configuration, because the metadata and semantic labelling suit machine learning.

In this blog I will not go through every step in detail. You can watch this YouTube video that I made to get a better idea.

Step 1 - Build your Pipeline and call the Mapping Data Flows

You can use Azure Synapse or Azure Data Factory; both have the capability to create Mapping Data Flows.

Your data flow will have two main elements: taking the source CSV data and putting it into a Delta Lake, and then moving data from the Delta Lake into the Common Data Model (CDM).

You have the creation and update of the Delta Lake, and then you have the upserts and the deletes to the CDM.

Pipeline

  • I first get the file name and break apart the date that is included in the filename, for example “contacts_20221020_0938.csv”. I take the year, which is 2022, the month, which is 10, the day, which is 20, and the hour, which is 09.
  • I then set them as variables, as you can see in the screenshot with the “set variable” activities (there is a sketch of those expressions below)
  • Then I call a data flow and send the variables with it
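Here is a minimal sketch of what those “set variable” expressions could look like in the pipeline expression language, assuming the filename is held in a variable called fileName (the names here are illustrative) and the variables are named to match the data flow parameters used later ($yearof, $monthof, $dayof, $hourof):

yearof  : @{substring(variables('fileName'), 9, 4)}
monthof : @{substring(variables('fileName'), 13, 2)}
dayof   : @{substring(variables('fileName'), 15, 2)}
hourof  : @{substring(variables('fileName'), 18, 2)}

With “contacts_20221020_0938.csv” these resolve to 2022, 10, 20 and 09.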

Step 2 - Inline sink type to build a Delta Lake

This is the first data flow that is called in the pipeline that you see above.

I have highlighted the “sink” here; you can see that it is an inline sink type. The “sourceContacts” dataset is pointing to the blob storage with the raw CSV files.

The AlterRow1 activity is just doing a true() for upserts. I think this helps with only pushing new data and updating existing rows.
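In the Alter Row settings that is a single condition, applied to every row:

Upsert if : true()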

Delta lake

Partition it by year, month, day and hour.

In order to do this you will have to create some “derived” columns, based on the year, month, day and hour that were passed to this data flow from the pipeline (derived from the filename and stored in variables). See below for what that looks like.

Derived columns
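As a sketch, the derived column expressions can simply copy the data flow parameters into the partition columns (assuming the parameters are strings; the column names match the folder names in the path shown further down):

year_col  : $yearof
month_col : $monthof
day_col   : $dayof
hour_col  : $hourof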

Now on to the Sink activity. Select the optimise blade, then set the partition to year, month, day and hour (see the images below). This will make your delta lake look a bit like this:

mydeltalake/year_col=2022/month_col=10/day_col=20/hour_col=09/somefileshere.parquet

See how I optimise here to partition it.

Derived columns
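As a rough sketch of those optimise settings, assuming key partitioning on the four derived columns (the exact field labels in the UI may differ slightly):

Partition type : Key
Key columns    : year_col, month_col, day_col, hour_col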

Step 3 - Inline sink type to build a Common Data Model (CDM)

Now that you have your Delta Lake partitioned by year, month, day and hour, you can ingest just the partitions that you want. In this screenshot you can see that the “source options” are set to only get the year, month, day and hour from the variables that are sent from the pipeline, so in this case it would be 2022 10 20 09. But obviously, for every day going forward, that date would change.

The code to put in your folder path is:

concat('contactbyymdh','/year_col=',$yearof,'/month_col=',$monthof,'/day_col=',$dayof,'/hour_col=',$hourof)
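With the example values sent from the pipeline, that expression resolves to the partition folder:

contactbyymdh/year_col=2022/month_col=10/day_col=20/hour_col=09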

source options

Settings and code you need for the CDM activity in the data flow

cdm settings

The “Corpus folder” is just where you have saved your custom cdm.json file (see more below on that topic).

The “Root location” is just where you want your CDM to be saved in your Lake store.

The “Manifest file” is optional to fill in; you can give it a name like I did, but it's not necessary.
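Pulled together, the CDM sink settings look something like the sketch below. Only the field names and the entity file name come from this post; the folder values are placeholders for wherever you keep your schema and output:

Corpus folder : <folder in your lake containing ContactAdornLogical.cdm.json>
Root location : <container/folder where the CDM output should be written>
Manifest file : <optional name>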

The JSON file that is used as your custom logical entity is below. Save it as a .cdm.json file. You can see that my file was saved as ContactAdornLogical.cdm.json (see above).

You can also point to Microsoft's CDM GitHub repo. So instead of choosing the radio button that says “custom”, you would choose “standard” and you would put in

Standard

/core/applicationCommon/Contact.cdm.json/Contact

Custom

{
  "jsonSchemaSemanticVersion": "1.2.0",
  "imports": [
    {
      "corpusPath": "cdm:/foundations.cdm.json"
    }
  ],
  "definitions": [
    {
      "entityName": "Contact",
      "hasAttributes": [
        {
          "name": "firstName",
          "dataType": "string"
        },
        {
          "name": "lastName",
          "dataType": "string"
        },
        {
          "name": "contactId",
          "dataType": "string"
        },
        {
          "name": "emailAddress1",
          "dataType": "string"
        },
        {
          "name": "phone",
          "dataType": "bigInteger"
        },
        {
          "name": "dateofbirth",
          "dataType": "date"
        },
        {
          "name": "modifiedOn",
          "dataType": "dateTime"
        },
        {
          "name": "createdOn",
          "dataType": "dateTime"
        },
        {
          "name": "Gender",
          "dataType": "string"
        }
      ]
    }
  ]
}

Step 4 - Connect to Dynamics 365 Customer Insights

In Customer Insights you will add a Data Source and choose the Data Lake Gen2 option; this lake holds the Common Data Model which you created in the previous steps.

All the steps after that are explained in the pages. You have to add yourself and the Service Principal mentioned in the image below as Storage Blob Data Contributors.

You will choose the manifest file in the pages that follow, and you will have to set a primary key, which is your “contactId”.

Once this is loaded, you will run unify to get this data to show in your Customers area.

customer insights

If you want to do incremental loads into Customer Insights you must have the right folder structure. You will see that my data flow has an upserts and a deletes flow. The code to construct the Upserts path is below (you will put this in the data flow in Synapse or Data Factory, in the Sink activity, in the partition path under Root location; see the screenshots in step 3):

concat('Contact/IncrementalData/',toString($yearof),'/',toString($monthof),'/',toString($dayof),'/',toString($hourof),'/Upserts')

The Deletes path is the same, but ends with “Deletes”:
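Spelled out, it is the same concat with the final segment changed:

concat('Contact/IncrementalData/',toString($yearof),'/',toString($monthof),'/',toString($dayof),'/',toString($hourof),'/Deletes')

With the example values from the pipeline, the two paths resolve to Contact/IncrementalData/2022/10/20/09/Upserts and Contact/IncrementalData/2022/10/20/09/Deletes.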

Step 5 - Unify

In order for Customer Insights to know which data is used to populate Customer records, you have to do a “unify” action.

customer insights unify

The steps are quite clear as you go through each page. Just click Next as you go, following the instructions. You will get to the last page, which says “create customer profiles”; hit that button and wait a few minutes while it builds your customers!

When you see this, you have done it, but Customer Insights will still be refreshing your “Search and filter”, so give it a few minutes.

customer insights unify

And that is it: our customers are populated from our Common Data Model. You can run more days of data in.

Great success

customer insights unify



A blog by Clint Grove, who lives in Cambridge and works for Microsoft, building useful data resources to help bring data to life. Find me on LinkedIn.